    Do you pay for Privacy in Online learning?

    Online learning, in the mistake-bound model, is one of the most fundamental concepts in learning theory. Differential privacy, on the other hand, is the most widely used statistical notion of privacy in the machine learning community. Characterizing which learning problems are both online learnable and differentially privately learnable is therefore of clear interest. In this paper, we ask whether the two problems are equivalent from a learning perspective, i.e., is privacy for free in the online learning framework? Comment: This is an updated version with (i) clearer problem statements, especially in the proposed Theorem 1, and (ii) a clearer discussion of existing work, especially Golowich and Livni (2021). Conference on Learning Theory, PMLR, 202
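
    The paper compares two standard notions that are worth stating explicitly. The definitions below are the textbook formulations, not reproduced from the paper: a hypothesis class is online learnable in the mistake-bound model if some algorithm makes only finitely many mistakes on any realizable sequence, and a randomized algorithm is differentially private if its output distribution is insensitive to changing a single example.

        % Mistake-bound online learnability of a class H:
        % some algorithm A has a finite worst-case number of mistakes
        % over all sequences labeled by some h* in H.
        \[
          \mathrm{M}(\mathcal{H}) \;=\; \inf_{A}\; \sup_{h^{*}\in\mathcal{H},\,(x_{1},x_{2},\dots)}
          \bigl|\{\, t : A(x_{t}) \neq h^{*}(x_{t}) \,\}\bigr| \;<\; \infty .
        \]
        % (\varepsilon,\delta)-differential privacy of a randomized algorithm M:
        % for all neighboring inputs S, S' (differing in one example) and all events E,
        \[
          \Pr[\, M(S) \in E \,] \;\le\; e^{\varepsilon}\,\Pr[\, M(S') \in E \,] + \delta .
        \]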

    Inverse Reinforcement Learning from a Gradient-based Learner

    Inverse Reinforcement Learning addresses the problem of inferring an expert's reward function from demonstrations. However, in many applications, we not only have access to the expert's near-optimal behavior, but we also observe part of her learning process. In this paper, we propose a new algorithm for this setting, in which the goal is to recover the reward function being optimized by an agent, given a sequence of policies produced during learning. Our approach is based on the assumption that the observed agent is updating her policy parameters along the gradient direction. We then extend our method to deal with the more realistic scenario in which we only have access to a dataset of learning trajectories. For both settings, we provide theoretical insights into our algorithms' performance. Finally, we evaluate the approach in a simulated GridWorld environment and on the MuJoCo environments, comparing it with a state-of-the-art baseline.
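
    The gradient-ascent assumption suggests a simple way to picture the estimation problem when the reward is linear in known features: each observed parameter update is then (approximately) linear in the unknown reward weights, so the weights can be fit by least squares across updates. The sketch below illustrates only this idea; the linear-reward assumption and the policy_gradient_per_feature callback are our own, not the paper's algorithm.

        import numpy as np

        def estimate_reward_weights(thetas, policy_gradient_per_feature, lr=1.0):
            """Recover linear-reward weights from observed policy parameters (sketch).

            Assumes (our simplification) the learner took gradient steps
              theta_{t+1} = theta_t + lr * sum_i w_i * g_i(theta_t),
            where g_i(theta) is the policy gradient computed as if feature i were
            the only reward, making each update linear in the unknown weights w.
            """
            rows, targets = [], []
            for t in range(len(thetas) - 1):
                # One column per reward feature: G[:, i] = g_i(theta_t).
                G = policy_gradient_per_feature(thetas[t])     # shape (dim_theta, n_features)
                rows.append(lr * G)
                targets.append(thetas[t + 1] - thetas[t])      # observed parameter update
            A = np.vstack(rows)
            b = np.concatenate(targets)
            w, *_ = np.linalg.lstsq(A, b, rcond=None)          # least-squares fit of the weights
            return w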

    Active Exploration for Inverse Reinforcement Learning

    Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferring a reward function from expert demonstrations. Many IRL algorithms require a known transition model and sometimes even a known expert policy, or they at least require access to a generative model. However, these assumptions are too strong for many real-world applications, where the environment can be accessed only through sequential interaction. We propose a novel IRL algorithm: Active exploration for Inverse Reinforcement Learning (AceIRL), which actively explores an unknown environment and expert policy to quickly learn the expert's reward function and identify a good policy. AceIRL uses previous observations to construct confidence intervals that capture plausible reward functions and to find exploration policies that focus on the most informative regions of the environment. AceIRL is the first approach to active IRL with sample-complexity bounds that does not require a generative model of the environment. AceIRL matches the sample complexity of active IRL with a generative model in the worst case. Additionally, we establish a problem-dependent bound that relates the sample complexity of AceIRL to the suboptimality gap of a given IRL problem. We empirically evaluate AceIRL in simulations and find that it significantly outperforms more naive exploration strategies. Comment: Presented at the Conference on Neural Information Processing Systems (NeurIPS), 202
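
    The abstract describes AceIRL only at a high level; the loop below is a deliberately simplified sketch of active exploration of this flavor in a tabular environment with an assumed gym-style interface. Visit counts yield a confidence-interval width shrinking like one over the square root of the count, and the exploration policy greedily visits the least-certain state-action pairs. The uncertainty bonus and the interface are our illustrative assumptions, not AceIRL's actual construction.

        import numpy as np

        def explore_for_irl(env, n_states, n_actions, episodes=100, horizon=50):
            """Simplified active-exploration loop for IRL (illustrative, not AceIRL)."""
            counts = np.ones((n_states, n_actions))      # visit counts (start at 1 to avoid /0)
            for _ in range(episodes):
                uncertainty = 1.0 / np.sqrt(counts)      # wide interval = poorly explored
                s = env.reset()                          # assumed gym-style tabular env
                for _ in range(horizon):
                    a = int(np.argmax(uncertainty[s]))   # head for the least-certain action
                    s_next, _, done, _ = env.step(a)
                    counts[s, a] += 1
                    s = s_next
                    if done:
                        break
            return counts                                # basis for reward confidence intervals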

    Provably Learning Nash Policies in Constrained Markov Potential Games

    Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives, but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collisions (safety). Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs can be found via constrained optimization. One tempting approach is to solve it by Lagrangian-based primal-dual methods. However, as we show, in contrast to the single-agent setting, CMPGs do not satisfy strong duality, rendering such approaches inapplicable and potentially unsafe. To solve the CMPG problem, we propose our algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs, which, under additional assumptions, also guarantee safe exploration. Comment: 30 pages
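
    The coordinate-ascent idea can be pictured as cycling through the agents and letting each one compute a constrained best response while the others' policies stay fixed, until the potential function stops improving. In the sketch below, the helper constrained_best_response stands in for a single-agent constrained-MDP solver; it and the stopping rule are our illustrative assumptions, not the paper's CA-CMPG.

        def coordinate_ascent_cmpg(agents, init_policies, constrained_best_response,
                                   potential_value, max_rounds=100, tol=1e-6):
            """Illustrative coordinate ascent for a constrained Markov potential game."""
            policies = list(init_policies)
            prev = potential_value(policies)
            for _ in range(max_rounds):
                for i in agents:
                    # Agent i solves its single-agent constrained MDP with the others fixed.
                    policies[i] = constrained_best_response(i, policies)
                current = potential_value(policies)
                if current - prev < tol:                 # no measurable improvement: stop
                    break
                prev = current
            return policies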

    Learning in Non-Cooperative Configurable Markov Decision Processes

    The Configurable Markov Decision Process framework includes two entities: a Reinforcement Learning agent and a configurator that can modify some environmental parameters to improve the agent's performance. This presupposes that the two actors have the same reward function. What if the configurator does not have the same intentions as the agent? This paper introduces the Non-Cooperative Configurable Markov Decision Process, a setting that allows two (possibly different) reward functions for the configurator and the agent. We then consider an online learning problem in which the configurator has to find the best among a finite set of possible configurations. We propose two learning algorithms that minimize the configurator's expected regret by exploiting the problem's structure, depending on the agent's feedback. While a naive application of the UCB algorithm yields a regret that grows indefinitely over time, we show that our approach suffers only bounded regret. Furthermore, we empirically demonstrate the performance of our algorithm in simulated domains.
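
    For reference, the naive UCB baseline over a finite set of configurations that the abstract contrasts against looks roughly as follows. The paper's own algorithms, which achieve bounded regret by exploiting the problem structure and the agent's feedback, are not reproduced here, and the payoff-observation callback is our illustrative assumption.

        import math

        def ucb_over_configurations(configurations, run_agent_and_observe, rounds=1000):
            """Naive UCB over environment configurations (baseline sketch only)."""
            k = len(configurations)
            counts = [0] * k
            means = [0.0] * k
            for t in range(1, rounds + 1):
                if t <= k:
                    i = t - 1                            # play each configuration once
                else:
                    ucb = [means[j] + math.sqrt(2 * math.log(t) / counts[j]) for j in range(k)]
                    i = max(range(k), key=lambda j: ucb[j])
                payoff = run_agent_and_observe(configurations[i])  # configurator's noisy payoff
                counts[i] += 1
                means[i] += (payoff - means[i]) / counts[i]        # running average
            return max(range(k), key=lambda j: means[j])           # best configuration found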

    Demo: JoyTag: a battery-less videogame controller exploiting RFID backscattering

    This demo presents our experience in developing a joystick for video games that uses RFID backscattering for battery-free operation. Specifically, we develop a system that gathers data from a wireless, battery-less joystick, named JoyTag, while it interacts with a videogame console. Our system enables consumers to use JoyTag at any time without worrying about charging.
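
    The demo description gives no implementation details; purely to illustrate the data path it sketches, the loop below polls a hypothetical RFID reader for backscattered tag reads and maps them to controller events. The reader interface and the tag-to-button table are invented for illustration and are not part of the JoyTag system.

        import time

        # Hypothetical mapping from RFID tag IDs (one per button) to controller events.
        TAG_TO_BUTTON = {"tag_up": "UP", "tag_down": "DOWN", "tag_a": "A", "tag_b": "B"}

        def poll_joytag(reader, send_event, poll_interval=0.02):
            """Translate backscattered tag reads into controller events (illustrative only)."""
            pressed = set()
            while True:
                visible = set(reader.read_tags())        # tags backscattering right now
                for tag in visible - pressed:            # newly pressed buttons
                    send_event(TAG_TO_BUTTON.get(tag, "UNKNOWN"), "down")
                for tag in pressed - visible:            # released buttons
                    send_event(TAG_TO_BUTTON.get(tag, "UNKNOWN"), "up")
                pressed = visible
                time.sleep(poll_interval)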